Abstract. Given a metric space with a Borel probability measure, for each
integer $N$ we obtain a probability distribution on $N \times N$ distance matrices
by considering the distances between pairs of points in a sample consisting
of $N$ points chosen independently from the metric space with respect to the
given measure. We show that this gives an asymptotically bi-Lipschitz relation
between metric measure spaces and the corresponding distance matrices. This
is an effective version of a result of Vershik that metric measure spaces are
determined by associated distributions on infinite random matrices.



1. Introduction 

Let $(X, d)$ be a metric space and let $\mu$ be a Borel probability measure on $X$ (we
shall henceforth refer to $(X, d, \mu)$ as a metric measure space). Consider a sequence
$\{x_n\}_{n \in \mathbb{N}}$ of random points in $X$ chosen independently according to the probability
measure $\mu$. We obtain a random matrix $D = (d_{ij}) = (d(x_i, x_j))$ with rows and
columns indexed by the positive integers $\mathbb{N}$. Thus, the triple $(X, d, \mu)$ gives rise to
a distribution on random matrices with rows and columns indexed by $\mathbb{N}$ (we shall
call these infinite square matrices).

This work is motivated by a result of Vershik that the metric measure space
$(X, d, \mu)$ is determined, up to measure preserving isometries, by the corresponding
distribution on infinite square matrices. Our goal is to give an effective version
of this result for distributions on matrices obtained by choosing a finite (but
sufficiently large) collection of points. Our result in fact gives a bi-Lipschitz
correspondence.

Namely, for a positive integer $N$, we sample $N$ independent points $x_1, x_2, \dots, x_N$
from a given compact metric space $X$ according to a given probability distribution
$\mu$. This gives a probability distribution on $N \times N$ (symmetric) matrices. We show
that there is an asymptotically bi-Lipschitz relation between metric measure spaces
and the corresponding distributions on matrices. Here we use a notion of distance
on metric measure spaces, which we call the Gromov-Hausdorff-Prokhorov distance,
which is a generalisation of the Gromov-Hausdorff distance between metric spaces.

One expects that if two metric measure spaces are close, then the corresponding 
distributions on matrices are close for all N (not just large N). We also prove such 
a result, but in this case the relation is Hölder with optimal exponent 1/2. Finally,
in Section [8] we introduce a quantity, which we call the relative entropy of metric 
measure spaces, and conjecture a large deviations result. 
We remark that the distributions on matrices are naturally related to metric
notions of curvature. In particular, CAT(k) spaces and, more generally, Wirtinger
spaces [3], can be viewed as defined in terms of the support of the distribution on
matrices. On the other hand, the Ricci curvature is related to the measure of
cones, which can be related to distance matrices for collections of points with one
point fixed and the others chosen at random. It would be interesting to study the
continuity of optimal transport with respect to the Gromov-Hausdorff-Prokhorov
distance.

2. The distance between metric measure spaces 

In this section, we recall the definitions of a metric on probability measures on 
a given metric space and the Gromov-Hausdorff distance between compact metric 
spaces. We then introduce a notion of distance between metric measure spaces, 
which is a combination of these definitions. We show that our definition indeed 
gives a metric when we consider metric spaces with measures having full support. 
Further, we prove that an analogue of Gromov's compactness theorem holds for 
metric measure spaces. 

2.1. Distance between distributions. Let $\mu_1$ and $\mu_2$ be Borel probability
measures on a given metric space $(Z, d_Z)$. Let $\pi_i : Z \times Z \to Z$, $i = 1, 2$, be the
projection maps. Consider probability measures $\theta$ on $Z \times Z$ so that the marginal
distributions satisfy $\pi_{i*}(\theta) = \mu_i$ for $i = 1, 2$. For such a measure $\theta$, we define
$\Delta(\theta) = \Delta(\theta; d_Z)$ by

$$\Delta(\theta) = \inf\{r > 0 : \theta(\{(z_1, z_2) \in Z \times Z : d_Z(z_1, z_2) < r\}) > 1 - r\}.$$

We define the Lévy-Prokhorov distance between $\mu_1$ and $\mu_2$ to be the infimum,

$$d_P(\mu_1, \mu_2) = \inf\{\Delta(\theta) : \theta \text{ a measure on } Z \times Z,\ \pi_{i*}(\theta) = \mu_i,\ i = 1, 2\}.$$

We shall sometimes denote this as $d_P(\mu_1, \mu_2; Z)$ or $d_P(\mu_1, \mu_2; d_Z)$ to clarify the
underlying metric space.

Note that the distance $d_P$ has another equivalent formulation. Namely, we
consider random variables $X_i(\omega) \in Z$, $i = 1, 2$, on a sample space $\Omega$ with probability
measure $P$, so that the distribution of $X_i$ is $\mu_i$ for $i = 1, 2$. For such a pair
$(X_1, X_2)$ of random variables, we consider

$$\Delta(X_1, X_2) = \inf\{r > 0 : P(\{\omega \in \Omega : d_Z(X_1(\omega), X_2(\omega)) < r\}) > 1 - r\}.$$

Then $d_P(\mu_1, \mu_2)$ is the infimum of $\Delta(X_1, X_2)$ over all pairs $(X_1, X_2)$ of random
variables so that the marginal distribution of $X_i$ is $\mu_i$ for $i = 1, 2$ (as we can see by
considering the pushforward of the measure $P$ on $\Omega$ to $Z \times Z$ using the map
$\omega \mapsto (X_1(\omega), X_2(\omega))$).
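For a coupling with finite support, the infimum defining $\Delta$ can be evaluated exactly. The Python sketch below does this under the assumption that the coupling is given as a dictionary from point pairs to weights; the function name `delta` and this input format are illustrative choices and do not come from the text.

```python
from typing import Callable, Dict, Hashable, Tuple

Point = Hashable


def delta(coupling: Dict[Tuple[Point, Point], float],
          dist: Callable[[Point, Point], float]) -> float:
    """Return inf{r > 0 : theta({d < r}) > 1 - r} for a finitely supported theta."""
    # Total mass carried by each value of the distance.
    mass_at: Dict[float, float] = {}
    for (z1, z2), weight in coupling.items():
        d = dist(z1, z2)
        mass_at[d] = mass_at.get(d, 0.0) + weight

    # Between two consecutive distance values d_j < r <= d_{j+1}, the mass of
    # {d < r} is c_j, the mass of {d <= d_j}. So r qualifies exactly when
    # r > max(d_j, 1 - c_j), and the infimum is the smallest such maximum.
    best = 1.0          # candidate from the initial interval, where c = 0
    cumulative = 0.0
    for d in sorted(mass_at):
        cumulative += mass_at[d]
        best = min(best, max(d, 1.0 - cumulative))
    return best


# Hypothetical example: mass 0.9 on a pair at distance 0, mass 0.1 at distance 5.
theta = {(0.0, 0.0): 0.9, (0.0, 5.0): 0.1}
print(delta(theta, lambda a, b: abs(a - b)))
```

The value of $\Delta$ never exceeds 1, since every $r > 1$ satisfies the defining condition trivially.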

2.2. Gromov-Hausdorff distance. Let $(X_1, d_1)$ and $(X_2, d_2)$ be compact metric
spaces. Consider pairs of isometric embeddings $\iota_i : X_i \to Z$, $i = 1, 2$, of $X_1$ and
$X_2$ into a metric space $Z$. For such embeddings, we can consider the Hausdorff
distance $d_H$ between $\iota_1(X_1)$ and $\iota_2(X_2)$ (as subsets of $Z$).

The Gromov-Hausdorff distance [2] between $X_1$ and $X_2$ is defined as the infimum
of such Hausdorff distances, i.e.,

$$d_{GH}(X_1, X_2) = \inf\{d_H(\iota_1(X_1), \iota_2(X_2)) : \iota_i : X_i \to Z \text{ isometric embeddings}\}$$

where $Z$ in the infimum varies over all (compact) metric spaces.
2.3. Distance between metric measure spaces. We now define the
Gromov-Hausdorff-Prokhorov distance between a pair of metric measure spaces $(X_i, d_i, \mu_i)$,
$i = 1, 2$, with the underlying metric space $X_i$ assumed to be compact. Consider
isometric embeddings $\iota_i : X_i \to Z$, $i = 1, 2$, of the spaces $X_i$ into a metric space
$Z$. These give rise to pushforward probability measures $\iota_{i*}(\mu_i)$ on $Z$. The distance
between the metric measure spaces is the infimum of the distance between the
pushforward probability measures over all isometric embeddings, i.e.,

$$d_{GHP}(X_1, X_2) = \inf\{d_P(\iota_{1*}(\mu_1), \iota_{2*}(\mu_2)) : \iota_i : X_i \to Z \text{ isometric embeddings}\}$$

where $Z$ in the infimum varies over all (compact) metric spaces.

We can identify the spaces $X_i$ with their images in $Z$. Further, we can assume
that, under these identifications, $Z = X_1 \cup X_2$. We shall often make such
identifications and identify the measures $\mu_i$ with the corresponding pushforward measures
on $Z$. Further, we often suppress the measures from the notation if they are clear
from the context.

It is clear that the distance is symmetric. We shall prove the triangle inequality,
showing that we get a pseudo-metric. We also show an appropriate positivity result,
showing in particular that we get a genuine metric on metric measure spaces for
which the measure has full support.

We remark that the definition of the Gromov-Hausdorff-Prokhorov distance
works even for pseudo-metric spaces, i.e., where we allow $d(x, y) = 0$ even if $x \neq y$.
All our results also hold in this case.

2.4. The triangle inequality. Let $(X_1, \mu_1)$, $(X_2, \mu_2)$ and $(Y, \nu)$ be metric
measure spaces.

Proposition 2.1. We have

$$d_{GHP}(X_1, X_2) \le d_{GHP}(X_1, Y) + d_{GHP}(X_2, Y).$$

Proof. Let $\epsilon > 0$ be arbitrary. By definition, for $i = 1, 2$, we can find spaces
$Z_i = X_i \cup Y$ so that the distance between (the pushforwards of) the measures $\mu_i$
and $\nu$ is at most $d_{GHP}(X_i, Y) + \epsilon$.

Now, let $Z$ be the metric space obtained from $Z_1 \coprod Z_2$ by identifying the
isometric copies of $Y$, and with distance the maximal metric whose restriction to each
$Z_i$ is the given metric. More concretely, the metric $d_Z$ on $Z$ is given by

(1) If both $x_1$ and $x_2$ lie in some $Z_i$, $d_Z(x_1, x_2) = d_{Z_i}(x_1, x_2)$.

(2) If $x_1 \in Z_1$ and $x_2 \in Z_2$, $d_Z(x_1, x_2) = \inf\{d_{Z_1}(x_1, y) + d_{Z_2}(y, x_2) : y \in Y\}$.

It is easy to see that $d_Z$ defined as above gives a metric whose restriction to each
$Z_i$ is the given metric on $Z_i$. The triangle inequality implies that this is indeed the
maximal such metric.
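For finite spaces the two rules above can be carried out directly on distance matrices. The following Python sketch is only an illustration under assumed conventions: each $Z_i$ is given by its distance matrix, and the index lists `y1` and `y2` (hypothetical parameter names) record where the common points of $Y$ sit in $Z_1$ and $Z_2$. The two copies of each point of $Y$ are kept as separate indices at distance zero, so the output is a pseudo-metric on the disjoint union rather than a metric on the glued space.

```python
from typing import List, Sequence


def glue(d1: Sequence[Sequence[float]], y1: Sequence[int],
         d2: Sequence[Sequence[float]], y2: Sequence[int]) -> List[List[float]]:
    """Distance matrix on Z_1 followed by Z_2, glued along the copies of Y.

    y1[k] (an index in Z_1) and y2[k] (an index in Z_2) are the two copies of
    the k-th point of Y.
    """
    n1, n2 = len(d1), len(d2)
    d = [[0.0] * (n1 + n2) for _ in range(n1 + n2)]
    for i in range(n1):
        for j in range(n1):
            d[i][j] = d1[i][j]                    # rule (1) inside Z_1
    for i in range(n2):
        for j in range(n2):
            d[n1 + i][n1 + j] = d2[i][j]          # rule (1) inside Z_2
    for i in range(n1):
        for j in range(n2):
            # rule (2): travel through the best point of Y
            through_y = min(d1[i][a] + d2[b][j] for a, b in zip(y1, y2))
            d[i][n1 + j] = d[n1 + j][i] = through_y
    return d


# Hypothetical example: Z_1 = {a, y} with d(a, y) = 1, Z_2 = {y, b} with d(y, b) = 2.
glued = glue([[0, 1], [1, 0]], [1], [[0, 2], [2, 0]], [0])
print(glued[0][3])  # distance from a to b through y
```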

The measures $\mu_i$ and $\nu$ push forward to give measures on $Z$, with the distance
between the pushforwards in $Z$ of the measures $\mu_i$ and $\nu$ at most $d_{GHP}(X_i, Y) + \epsilon$
for $i = 1, 2$. By the triangle inequality for $d_P$, it follows that

$$d_{GHP}(X_1, X_2) \le d_P(\mu_1, \mu_2) \le d_{GHP}(X_1, Y) + d_{GHP}(X_2, Y) + 2\epsilon.$$

As $\epsilon > 0$ was arbitrary, the result follows.



□ 
2.5. Positivity of the distance function. As the definition of $d_{GHP}$ ignores, for
example, isolated points that have measure 0 (which is to be expected for stochastic
concepts), we do not expect $d_{GHP}(X, Y) = 0$ to imply that $X = Y$, but only that
this is true up to ignoring an appropriate class of sets with measure 0. We now
prove such a result.

Theorem 2.2. For two metric measure spaces $(X, \mu)$ and $(Y, \nu)$, $d_{GHP}(X, Y) = 0$
if and only if there are open sets $U \subset X$ and $V \subset Y$ of measure 0 so that there is
a measure preserving isometry between $X \setminus U$ and $Y \setminus V$.

Proof. Given a measure preserving isometry between $X \setminus U$ and $Y \setminus V$, with $U$
and $V$ open sets with measure zero, we take $Z$ to be the space obtained from
$X \coprod Y$ by identifying $X \setminus U$ and $Y \setminus V$ using the given measure preserving isometry,
with the metric on $Z$ the maximal metric whose restrictions to $X$ and $Y$ are the
given metrics (analogous to the one constructed in Proposition 2.1). Then the
pushforward measures on $Z$ under the inclusion maps are equal. It follows that
$d_{GHP}(X, Y) = 0$.

Conversely, by hypothesis, there is a sequence of metric spaces $Z_i = X \cup Y$ so
that the distance between the pushforwards in $Z_i$ of the measures $\mu$ and $\nu$ converges
to 0. We shall first construct a limit of these spaces.

It is easy to see that there is a uniform bound on the diameter of the spaces $Z_i$.
Further, as $Z_i = X \cup Y$, given $\epsilon > 0$ there is an integer $N = N(\epsilon)$, independent of $i$,
so that there is a finite set $F_i \subset Z_i$ with cardinality at most $N$ so that each point
in $Z_i$ has distance at most $\epsilon$ from $F_i$. Hence, by Gromov's compactness theorem for
metric spaces, on passing to a subsequence, $Z_i$ converges in the Gromov-Hausdorff
metric to a compact metric space $Z$.

By an equivalent definition of the Gromov-Hausdorff distance, we have a
sequence $\epsilon_i \to 0$ and maps $\varphi_i : Z_i \to Z$ so that

$$|d(\varphi_i(p), \varphi_i(q)) - d(p, q)| < \epsilon_i, \quad \forall p, q \in Z_i.$$

Hence, on composing with the maps from $X$ and $Y$ to $Z_i$, we get maps $f_i : X \to Z$
and $g_i : Y \to Z$ that satisfy the analogous conditions

$$|d(f_i(p), f_i(q)) - d(p, q)| < \epsilon_i, \quad \forall p, q \in X,$$

and

$$|d(g_i(p), g_i(q)) - d(p, q)| < \epsilon_i, \quad \forall p, q \in Y.$$

As in the proof of the Arzelà-Ascoli theorem, we can pass to a subsequence to
obtain limiting maps $f : X \to Z$ and $g : Y \to Z$ that are isometric embeddings.

Further, we have measures $\theta_i$ on $Z_i \times Z_i$ with marginals the pushforwards of
the measures $\mu$ and $\nu$ to $Z_i$, so that $\Delta(\theta_i) \to 0$. On passing to a subsequence,
the pushforwards to $Z \times Z$ of the measures $\theta_i$ converge to a measure $\theta$ on $Z \times Z$, with
marginals of $\theta$ the pushforwards of the measures $\mu$ and $\nu$ and with $\Delta(\theta) = 0$. It
follows that the pushforward measures coincide.

In particular, identifying $X$ and $Y$ with their images, the measures are supported
on $X \cap Y$ ($X \cap Y$ is defined by viewing $X$ and $Y$ as subsets of $Z$). Hence, if
$U = X \setminus (X \cap Y)$ and $V = Y \setminus (X \cap Y)$, $U$ and $V$ are open sets of zero measure in
$X$ and $Y$, respectively (as $X \cap Y$ is compact, hence closed in $X$ and $Y$), and there
is a measure preserving isometry between $X \setminus U$ and $Y \setminus V$.

□ 
In particular, if we restrict to metric measure spaces with the measure having
full support, $d_{GHP}$ is a genuine metric.

2.6. A compactness theorem. We shall show that an analogue of Gromov's 
compactness theorem holds for metric measure spaces. 

Theorem 2.3. Let $(X_n, d_n, \mu_n)$ be a sequence of compact metric measure spaces
so that

(1) there is a uniform bound $D > 0$ on the diameter of the spaces $X_n$.

(2) for each $\epsilon > 0$ there is an integer $N(\epsilon)$ so that, for each $n \in \mathbb{N}$, $X_n$ contains
an $\epsilon$-net with cardinality at most $N(\epsilon)$.

Then there is a subsequence $X_{n_i}$ that converges in the Gromov-Hausdorff-Prokhorov
metric.

Proof. By Gromov's compactness theorem for metric spaces, on passing to a
subsequence, we can ensure that the metric spaces $(X_n, d_n)$ converge to a metric space
$(Z, d_Z)$. By an equivalent formulation of the Gromov-Hausdorff distance, we can
choose the subsequence so that there are functions $f_n : X_n \to Z$ so that for all $n$,

(1) $|d_Z(f_n(x), f_n(y)) - d_n(x, y)| < \frac{1}{n}$.

Consider the pushforward measures $\eta_n = f_{n*}(\mu_n)$ on $Z$. We first bound the
distance between $(X_n, d_n, \mu_n)$ and $(Z, d_Z, \eta_n)$. Namely, consider the maximal metric
$d_W$ on $W = X_n \coprod Z$ so that

(1) if $x_1, x_2 \in X_n$, $d_W(x_1, x_2) \le d_n(x_1, x_2)$.

(2) if $x_1, x_2 \in Z$, $d_W(x_1, x_2) \le d_Z(x_1, x_2)$.

(3) if $x \in X_n$, $d_W(x, f_n(x)) \le 1/n$.

Concretely, we consider $d'_W : W \times W \to \mathbb{R}$ to be the symmetric function defined
by:

(1) if $x_1, x_2 \in X_n$, $d'_W(x_1, x_2) = d_n(x_1, x_2)$.

(2) if $x_1, x_2 \in Z$, $d'_W(x_1, x_2) = d_Z(x_1, x_2)$.

(3) if $x_1 \in X_n$ and $x_2 \in Z$, then

$$d'_W(x_1, x_2) = \inf\{d_n(x_1, x) + \tfrac{1}{n} + d_Z(f_n(x), x_2) : x \in X_n\}.$$

The function $d'_W$ satisfies the triangle inequality and hence gives a metric on $W$.
Further, $d'_W$ satisfies the conditions required by $d_W$. By the triangle inequality we
can see that any metric satisfying the conditions required for $d_W$ is bounded above
by $d'_W$, so $d_W = d'_W$ by maximality.

In particular, the inclusion maps from $X_n$ and $Z$ are isometric embeddings in
$W$ (as they are for the metric $d'_W$ by construction). Let $\nu_n$ be the measure on
the product $W \times W$ obtained by pushing forward the measure $\mu_n$ using the map
$x \mapsto (x, f_n(x))$. Then the marginals of $\nu_n$ are $\mu_n$ and $\eta_n$ and $\Delta(\nu_n) \le 1/n$. This
shows that $d_{GHP}((X_n, d_n, \mu_n), (Z, d_Z, \eta_n)) \le 1/n$.

As the measures $\eta_n$ are all Borel probability measures, on passing to a
subsequence these converge to a measure $\eta$. We can thus assume that $d_P(\eta_n, \eta) < 1/n$,
and hence $d_{GHP}((Z, d_Z, \eta_n), (Z, d_Z, \eta)) < 1/n$. By the triangle inequality, it follows
that $d_{GHP}((X_n, d_n, \mu_n), (Z, d_Z, \eta)) < 2/n$.

□ 
3. Distance between square matrices

To make uniform statements, it will be convenient to introduce a modified
distance function on the space $M(N)$ of $N \times N$ symmetric square matrices by
allowing some rows and corresponding columns to be excluded. Namely, for matrices
$A = (a_{ij})$ and $B = (b_{ij})$, we let

$$d_M(A, B) = \inf\{\rho > 0 : \exists \Lambda \subset \{1, 2, \dots, N\},\ |\Lambda| < N\rho,\ |a_{ij} - b_{ij}| < \rho\ \ \forall i, j \notin \Lambda\}.$$

Note that this is a pseudo-metric, but we can use pseudo-metrics in place of
metrics in most constructions. Further, note that for any fixed $N$ this coincides
with the supremum metric for a pair of matrices $A$, $B$ with $d_M(A, B) < 1/N$.

The permutation group $S_N$ on $N$ letters acts on $M(N)$ by simultaneous
permutations of rows and columns. We have a corresponding distance $d_\pi$, with matrices
close in $d_\pi$ if they are close in the previous sense up to permutation. More precisely,

$$d_\pi(A, B) = \min\{d_M(A, \pi B) : \pi \in S_N\}.$$
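For small $N$ both distances can be computed by exhaustive search. The Python sketch below is a brute-force illustration, not an efficient algorithm; the function names `d_M`, `d_pi` and `distance_matrix` are chosen here for convenience. It uses the observation that, for a fixed excluded set $\Lambda$, the admissible values of $\rho$ are exactly those exceeding both $|\Lambda|/N$ and the largest remaining entrywise difference.

```python
from itertools import combinations, permutations
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def d_M(a: Matrix, b: Matrix) -> float:
    """Minimise max(|Lambda| / N, largest difference outside Lambda) over Lambda."""
    n = len(a)
    best = 1.0                                  # excluding every row always works
    for k in range(n):
        for excluded in combinations(range(n), k):
            kept = [i for i in range(n) if i not in excluded]
            worst = max(abs(a[i][j] - b[i][j]) for i in kept for j in kept)
            best = min(best, max(k / n, worst))
    return best


def d_pi(a: Matrix, b: Matrix) -> float:
    """Minimise d_M over simultaneous row and column permutations of b."""
    n = len(a)
    best = 1.0
    for perm in permutations(range(n)):
        permuted = [[b[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        best = min(best, d_M(a, permuted))
    return best


def distance_matrix(points: Sequence[float]) -> List[List[float]]:
    """Distance matrix of a finite subset of the real line."""
    return [[abs(p - q) for q in points] for p in points]


# Hypothetical example: two four-point subsets of the real line.
eps = 0.01
A = distance_matrix([-eps, 0.0, eps, 1.0])
B = distance_matrix([0.0, eps, 1.0, 1.0 + eps])
print(d_pi(A, B))
```

The search visits $N! \cdot 2^N$ configurations, so it is only practical for very small matrices.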

These induce distances on the space of distributions on $M(N)$. We observe
that these induced distances coincide for distributions that are invariant under the
symmetric group. Let $\mu_1$ and $\mu_2$ be distributions on $M(N)$ that are invariant under
the action of $S_N$. We remark that the following proposition holds (with the same
proof) for more general group actions.

Proposition 3.1. For measures $\mu_1$ and $\mu_2$ invariant under the action of $S_N$,

$$d_P(\mu_1, \mu_2; d_M) = d_P(\mu_1, \mu_2; d_\pi).$$

Proof. As $d_M(\cdot, \cdot) \ge d_\pi(\cdot, \cdot)$, it follows that $d_P(\mu_1, \mu_2; d_M) \ge d_P(\mu_1, \mu_2; d_\pi)$.
The converse follows from the following lemma.

Lemma 3.2. Given measures $\mu_1$ and $\mu_2$ on $M(N)$ that are invariant under the
action of $S_N$ and a measure $\nu$ on $M(N) \times M(N)$ with marginals $\mu_i$, there is a
measure $\nu'$ with marginals $\mu_i$ so that $\Delta(\nu'; d_M) = \Delta(\nu; d_\pi)$.

Proof. Given matrices $A$ and $B$, there is an element $\sigma \in S_N$ so that we have
$d_M(A, \sigma B) = d_\pi(A, B)$. Let $\psi : M(N) \times M(N) \to M(N) \times M(N)$ be a measurable
function that associates to $(A, B)$ a pair $(A, \sigma B)$ so that $d_M(A, \sigma B) = d_\pi(A, B)$.
We define $\nu'' = \psi_*(\nu)$, i.e., the pushforward of $\nu$ under the map $\psi$, and let $\nu'$ be
obtained from $\nu''$ by averaging with respect to the diagonal action of $S_N$.

By construction, if $\mu'_i$ and $\mu''_i$ are the marginals of $\nu'$ and $\nu''$, then for an orbit
$S_N A$ of a matrix $A$ we have $\mu'_i(S_N A) = \mu''_i(S_N A) = \mu_i(S_N A)$ (as the constructions
of $\nu''$ from $\nu$ and $\nu'$ from $\nu''$ leave the marginal measure of an orbit unchanged).
Further, by construction the marginals $\mu'_i$ are invariant under the action of $S_N$;
by hypothesis, the measures $\mu_i$ are also invariant under this action. It follows that
$\mu'_i = \mu_i$, so $\mu_i$ are the marginals of $\nu'$. By construction $\Delta(\nu'; d_M) = \Delta(\nu; d_\pi)$. □

Now, let $\nu$ be a measure on $M(N) \times M(N)$ with marginals $\mu_i$ satisfying

$$\Delta(\nu; d_\pi) < d_P(\mu_1, \mu_2; d_\pi) + \epsilon.$$

By the above lemma there is a measure $\nu'$ with marginals $\mu_i$ so that we have
$\Delta(\nu'; d_M) = \Delta(\nu; d_\pi)$. This implies that

$$d_P(\mu_1, \mu_2; d_M) < d_P(\mu_1, \mu_2; d_\pi) + \epsilon.$$

As $\epsilon > 0$ is arbitrary, the claim follows.



□ 



METRIC MEASURE SPACES AND DISTANCE MATRICES 






4. Finite metric spaces and Distance matrices 

A distance matrix is a real symmetric matrix $A = (a_{ij})$ with non-negative entries
and zeroes on the diagonal which satisfies the triangle inequality

$$a_{ij} \le a_{ik} + a_{kj}$$

for all $i$, $j$ and $k$. Let $\mathbb{D}(N)$ be the subset of $M(N)$ consisting of distance matrices.
We shall consider the space $\mathbb{D}(N)$ with metrics obtained by restricting $d_M$ and $d_\pi$.

Let $X = \{x_1, \dots, x_N\}$ be a finite set with a pseudo-metric $d$. We can associate
to $X$ a distance matrix $A = (a_{ij})$, with $a_{ij} = d(x_i, x_j)$.

Conversely, there is a natural map $\Theta$ from $\mathbb{D}(N)$ to pseudo-metric spaces carrying
Borel probability measures. Namely, we associate to $A = (a_{ij})$ the space $\Theta(A)$ with
points $x_i$, $1 \le i \le N$, corresponding to the rows (equivalently the columns) of $A$
and the distance given by $d(x_i, x_j) = a_{ij}$. The measure of each singleton set $\{x_i\}$
is defined to be $1/N$. This construction gives a pseudo-metric and measure. We
remark that we can get a metric space by identifying points whose distance is zero
and pushing forward the measure under the corresponding quotient map.
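The passage from a distance matrix to a metric space with a measure can be made concrete as follows. This Python sketch is an illustration with hypothetical function names (`is_distance_matrix`, `theta_quotient`); it checks the defining conditions of a distance matrix and then forms the quotient in which points at distance zero are identified, with the measure pushed forward.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def is_distance_matrix(a: Matrix) -> bool:
    """Symmetric, zero diagonal, non-negative, and satisfies the triangle inequality."""
    n = len(a)
    idx = range(n)
    symmetric = all(a[i][j] == a[j][i] for i in idx for j in idx)
    zero_diagonal = all(a[i][i] == 0 for i in idx)
    non_negative = all(a[i][j] >= 0 for i in idx for j in idx)
    triangle = all(a[i][j] <= a[i][k] + a[k][j]
                   for i in idx for j in idx for k in idx)
    return symmetric and zero_diagonal and non_negative and triangle


def theta_quotient(a: Matrix) -> Tuple[List[List[int]], List[Fraction], List[List[float]]]:
    """Return (classes, weights, distances) for the metric quotient of Theta(A)."""
    if not is_distance_matrix(a):
        raise ValueError("not a distance matrix")
    n = len(a)
    classes: List[List[int]] = []
    for i in range(n):
        for cls in classes:
            if a[cls[0]][i] == 0:     # distance zero is an equivalence relation
                cls.append(i)
                break
        else:
            classes.append([i])
    # Each row carries mass 1/N, so a class of size m carries mass m/N.
    weights = [Fraction(len(cls), n) for cls in classes]
    distances = [[a[c[0]][e[0]] for e in classes] for c in classes]
    return classes, weights, distances


# Hypothetical example: rows 0 and 1 describe the same point.
print(theta_quotient([[0, 0, 1], [0, 0, 1], [1, 1, 0]]))
```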

We show that $\Theta$ is a bi-Lipschitz map, with Lipschitz constant 2.

Theorem 4.1. For $A, B \in \mathbb{D}(N)$, we have

$$d_{GHP}(\Theta(A), \Theta(B)) \le d_\pi(A, B) \le 2 d_{GHP}(\Theta(A), \Theta(B)).$$

Proof. Suppose $A = (a_{ij})$ and $B = (b_{ij})$ are distance matrices in $\mathbb{D}(N)$. The spaces
$X = \Theta(A)$ and $Y = \Theta(B)$ have $N$ points, which we denote $x_1, x_2, \dots, x_N$ and
$y_1, y_2, \dots, y_N$, with $d_X(x_i, x_j) = a_{ij}$ and $d_Y(y_i, y_j) = b_{ij}$. The measures on the spaces
$X$ and $Y$ assign a measure of $1/N$ to each point.

Suppose $d_\pi(A, B) < \epsilon$. By permuting the rows and columns of $A$ and $B$, we can
assume that $|a_{ij} - b_{ij}| < \epsilon$ if $1 \le i, j \le N(1 - \epsilon)$. Let $Z = X \coprod Y$ with metric $d_Z$
the maximal metric so that

(1) $d_Z(x_i, x_j) \le d_X(x_i, x_j)$ for $1 \le i, j \le N$.

(2) $d_Z(y_i, y_j) \le d_Y(y_i, y_j)$ for $1 \le i, j \le N$.

(3) $d_Z(x_i, y_i) \le \epsilon$ for $1 \le i \le N(1 - \epsilon)$.

As in the analogous construction in Theorem 2.3, we see that $X$ and $Y$ isometrically
embed in $Z$. Further, by considering the measure on $Z \times Z$ which assigns the weight
$1/N$ to points of the form $(x_i, y_i)$, $1 \le i \le N$, and zero to all other points, it follows
that $d_{GHP}(X, Y) \le \epsilon$.

Conversely, suppose, for some $\epsilon > 0$, $d_{GHP}(X, Y) < \epsilon$. It follows that there is a
space $Z$ with subspaces that can be identified with $X$ and $Y$ so that $Z = X \cup Y$ and
the pushforward measures $\mu_X$ and $\mu_Y$ satisfy $d_P(\mu_X, \mu_Y) < \epsilon$. Let $\nu$ be a measure
on $Z \times Z$ with marginals $\mu_X$ and $\mu_Y$ so that $\Delta(\nu) < \epsilon$. We shall consider a bijection
between the points of $X$ and $Y$, which will give the necessary permutation of the
rows and columns of $A$ and $B$.

We now prove the existence of the desired bijection. A second proof is given
later.

Let $X_0 = \{x \in X : \exists y \in Y \text{ such that } d(x, y) < \epsilon\}$. Then there exist functions
$f : X_0 \to Y$ so that $d(x, f(x)) < \epsilon$ for all $x \in X_0$. Choose such a function so that
the cardinality (equivalently the measure) of $f(X_0)$ is maximised.

Lemma 4.2. For $f$ maximal as above, $\mu_Y(Y \setminus f(X_0)) < \epsilon$.
Proof. We define subsets $X_1 \subset X$ and $Y_1 \subset Y$. Namely, for $n \ge 0$, we define a
permissible chain to be a sequence of elements of the form $y_0, x_1, y_1, \dots, x_n, y_n$
so that

(1) $y_0 \in Y \setminus f(X_0)$,

(2) $d(x_i, y_{i-1}) < \epsilon$, $1 \le i \le n$,

(3) $y_i = f(x_i)$, $1 \le i \le n$.

We also allow $n = 0$ for permissible chains. Observe that condition (2) implies
that $x_i \in X_0$, so $f(x_i)$ is defined and condition (3) makes sense.

Let $X_1 \subset X$ and $Y_1 \subset Y$ be the sets of elements of the form $x_n$ and $y_n$,
respectively, for permissible chains $y_0, x_1, y_1, \dots, x_n, y_n$. Note that in particular
$Y \setminus f(X_0) \subset Y_1$ (by considering chains with $n = 0$).

Note that if $y \in Y_1$ and $x \in X$ satisfies $d(x, y) < \epsilon$, then $x \in X_1$. In particular,
as $Y \setminus f(X_0) \subset Y_1$, we have

(2) $Y \setminus f(X_0) = Y_1 \setminus f(X_0) = Y_1 \setminus f(X_1)$.

We claim that $f$ is injective on $X_1$. For, if $f(p) = f(q)$ with $p \in X_1$, then there
is a permissible chain $y_0, x_1, y_1, \dots, x_n, y_n$ with $p = x_n$. Without loss of generality,
we can assume that $q \neq x_i$ for $1 \le i \le n$. We define a function $g : X_0 \to Y$ by
$g(x_i) = y_{i-1}$ for $1 \le i \le n$ and $g(x) = f(x)$ if $x \notin \{x_1, \dots, x_n\}$. Observe that we
have $d(x, g(x)) < \epsilon$ for all $x \in X_0$ and $g(X_0) = f(X_0) \cup \{y_0\}$ (as $f(p) = f(q) = g(q)$
is in the image of $g$). This contradicts the maximality of the image of $f$. Thus, $f$
must be injective on $X_1$.

As $f : X_1 \to Y_1$ is injective and hence measure preserving, $\mu_Y(f(X_1)) = \mu_X(X_1)$,
hence

(3) $\mu_Y(Y_1) = \mu_X(X_1) + \mu_Y(Y_1 \setminus f(X_1))$.

Further, we have,

(4) $\mu_Y(Y_1) = \nu(X \times Y_1) = \nu(X_1 \times Y_1) + \nu((X \setminus X_1) \times Y_1)$.

For each pair $(x, y) \in (X \setminus X_1) \times Y_1$, $d(x, y) \ge \epsilon$. It follows that
$\nu((X \setminus X_1) \times Y_1) < \epsilon$. Further, $\nu(X_1 \times Y_1) \le \mu_X(X_1)$. Substituting these estimates
in Equation (4) and using Equation (3) we obtain

(5) $\mu_X(X_1) + \mu_Y(Y_1 \setminus f(X_1)) = \mu_Y(Y_1) < \mu_X(X_1) + \epsilon$.

Finally, using Equation (2) we obtain

(6) $\mu_Y(Y \setminus f(X_0)) = \mu_Y(Y_1 \setminus f(X_1)) < \epsilon$,

as desired. □

It follows that $f : X_0 \to Y$ has image with complement having at most $N\epsilon$
points. We can replace $X_0$ by a subset to make $f$ injective without changing its
image. Extending this arbitrarily gives a bijection between points of $X$ and $Y$ so
that the distance between $x$ and $f(x)$ is at most $\epsilon$ for at least $N(1 - \epsilon)$ points in $X$.
By applying permutations, we can assume that $d(x_i, y_i) < \epsilon$ for $1 \le i \le N(1 - \epsilon)$.
By the triangle inequality we deduce that $|a_{ij} - b_{ij}| < 2\epsilon$ for $1 \le i, j \le N(1 - \epsilon)$.

Thus, $d_\pi(A, B) \le 2\epsilon$. As this holds whenever $d_{GHP}(X, Y) < \epsilon$, we see that

$$d_\pi(A, B) \le 2 d_{GHP}(\Theta(A), \Theta(B)),$$

as desired. □



Now we sketch the promised second proof. 
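Alternate proof of the second inequality in Theorem 4.1. Suppose $d_{GHP}(X, Y) < \epsilon$
and as before consider a probability measure $\nu$ on $Z \times Z$ with $\Delta(\nu) < \epsilon$ and having
marginals $\mu_X$ and $\mu_Y$. Let $\nu_{ij} = \nu\{(x_i, y_j)\}$.

The images of $X$ and $Y$ in $Z$ may be assumed to have $N$ distinct points each,
so that each of $\mu_X$ and $\mu_Y$ give mass $1/N$ to $N$ distinct points. Then the matrix
$(N\nu_{ij})_{i,j \le N}$ is a doubly stochastic matrix. By Birkhoff's theorem, it can be written
as a convex combination $\sum_\sigma c_\sigma \sigma$ of permutation matrices. Hence,

$$\sum_{i,j} \nu_{ij} \mathbf{1}_{d(x_i, y_j) \ge \epsilon} = \sum_\sigma c_\sigma \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}_{d(x_i, y_{\sigma(i)}) \ge \epsilon}.$$

As $c_\sigma$ are non-negative and sum to one, it follows that there exists a permutation
$\sigma$ such that

$$\frac{1}{N} \sum_{i=1}^{N} \mathbf{1}_{d(x_i, y_{\sigma(i)}) \ge \epsilon} \le \sum_{i,j} \nu_{ij} \mathbf{1}_{d(x_i, y_j) \ge \epsilon} < \epsilon.$$

Omit all $i$ such that $d(x_i, y_{\sigma(i)}) \ge \epsilon$. If $i, j$ are among the remaining $N(1 - \epsilon)$
indices, then $|a_{ij} - b_{\sigma(i)\sigma(j)}| < 2\epsilon$ by the triangle inequality. Thus, $d_\pi(A, B) \le 2\epsilon$ as
required to show. □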



We remark that it is not true that if two finite metric spaces are close in the
Gromov-Hausdorff distance, then their distance matrices are close. For example,
for $\epsilon > 0$ small consider the subspaces of $\mathbb{R}$ given by

$$X = \{-\epsilon, 0, \epsilon, 1\},$$

and

$$Y = \{0, \epsilon, 1, 1 + \epsilon\}.$$

As the Hausdorff distance between these, as subsets of $\mathbb{R}$, is $\epsilon$, it follows that
$d_{GH}(X, Y) \le \epsilon$. On the other hand, for $\epsilon$ small, the distance between the
corresponding distance matrices is at least 1/4, as if we exclude less than 1/4th of the
rows, i.e., no row, then the distance between some pair of corresponding entries of
the two matrices is greater than 1/2.

It would be interesting to understand the relation between Gromov-Hausdorff 
and Gromov-Hausdorff-Prokhorov convergence for Riemannian manifolds with the 
normalised volume measure, especially in the case of collapsing with bounded sec- 
tional curvatures. 



5. Limits of Samples 

Let $(X, d, \mu)$ be a compact metric measure space. In this section we show that
the empirical space from a sufficiently large sample is close to $X$ in the
Gromov-Hausdorff-Prokhorov metric.

Definition 5.1. Suppose $x_1, x_2, \dots, x_N$ are $N$ points in $X$. The empirical space is
the pseudo-metric measure space consisting of these points with distance induced
from $X$ and measure associating equal weight to each sample point.
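As a concrete illustration of the definition, the Python sketch below draws a sample and records the empirical space as a distance matrix together with uniform weights. The sampler, the metric and the function name `empirical_space` are assumptions made for the example (here the unit interval with the uniform measure); any sampler and metric could be substituted.

```python
import random
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def empirical_space(sample_point: Callable[[random.Random], T],
                    dist: Callable[[T, T], float],
                    n: int,
                    rng: random.Random) -> Tuple[List[T], List[List[float]], List[float]]:
    """Sample n independent points and return (points, distance matrix, weights)."""
    points = [sample_point(rng) for _ in range(n)]
    matrix = [[dist(p, q) for q in points] for p in points]
    weights = [1.0 / n] * n       # equal weight on each sample point
    return points, matrix, weights


# Hypothetical example: the unit interval with the uniform measure.
rng = random.Random(0)
points, matrix, weights = empirical_space(
    lambda r: r.random(), lambda p, q: abs(p - q), 5, rng)
print(len(points), sum(weights))
```

Repeated sample points are kept as separate rows, which is why the result is in general only a pseudo-metric space.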

Let $\epsilon > 0$ be fixed. As $X$ is compact, there are finitely many points $a_1, \dots, a_n$ in
$X$ so that each $x \in X$ satisfies $d(x, a_i) < \epsilon$ for some $a_i$. Further, we can partition
$X$ into disjoint measurable sets $A_i$, $1 \le i \le n$, so that, if $x \in A_i$, then $d(x, a_i) < \epsilon$.
Let $\psi : X \to \{a_1, \dots, a_n\}$ be the function that maps each $x \in A_i$ to $a_i$.
Consider the metric measure space $\hat{X}$ consisting of the points $\{a_i\}$ with distances
induced from $X$, and with measure $\hat{\mu}$ given by assigning weight $\mu(A_i)$ to the point
$a_i$. Note that $\hat{\mu}$ can also be regarded as a measure on $X$.

Lemma 5.2. We have $d_{GHP}(\hat{X}, X) \le \epsilon$.

Proof. Let $Z = X$ and the maps from $\hat{X}$ and $X$ to $Z$ be the inclusion and the
identity. We estimate the distance between the pushforward measures, which we
identify with $\hat{\mu}$ and $\mu$.

Namely, we consider the measure $\nu$ on $X \times X$ supported on the union of sets of
the form $\{a_i\} \times A_i$ and so that, for $S \subset A_i$, $\nu(\{a_i\} \times S) = \mu(S)$. Then this has
marginals $\hat{\mu}$ and $\mu$ and satisfies $\Delta(\nu) \le \epsilon$. □

Now consider a sample of $N$ points from $X$ chosen independently according to
the measure $\mu$. Let $X_N$ with measure $\mu_N$ be the corresponding empirical space. Let
$\hat{X}_N$ be the metric measure space with underlying space $\hat{X}$ and measure $\psi_*(\mu_N)$.

Lemma 5.3. We have $d_{GHP}(X_N, \hat{X}_N) \le \epsilon$.

Proof. Let $Z = X$ and the maps from the spaces to $X$ be the inclusion maps. We
estimate the distance between the pushforward measures, which are the measures
$\psi_*(\mu_N)$ and $\mu_N$ regarded as measures on $X$. Namely, we define a measure $\nu$ on
$X \times X$ with support points of the form $(a_i, b)$, $b \in A_i \cap X_N$, with such a point
having weight $\mu_N(b)$. This then has the appropriate marginals (as each point $b$ is
in a unique set $A_i$, $\pi_{2*}(\nu) = \mu_N$) and satisfies $\Delta(\nu) \le \epsilon$. □

We are thus reduced to comparing two distributions on a finite metric space,
namely $\hat{X}$. Let $S = \{a_1, \dots, a_n\}$ be a finite metric space. Consider two probability
mass functions $\{p_i\}$ and $\{q_i\}$ on $S$.

Lemma 5.4. $d_{GHP}((S, \{p_i\}), (S, \{q_i\})) \le 1 - \sum_{i=1}^{n} \min\{p_i, q_i\} \le \sum_{i=1}^{n} |p_i - q_i|$.

Proof. We can choose a measure on $S \times S$ with marginals $(p_i)$ and $(q_i)$ so that
the point $(a_i, a_i)$ has weight at least $\min\{p_i, q_i\}$. This gives the first inequality. The
second inequality follows from the first as $\sum_i p_i = 1$. □
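One explicit choice of such a measure puts weight $\min\{p_i, q_i\}$ on each diagonal point and spreads the remaining mass as a product of the excesses. The Python sketch below builds this coupling; it is one possible construction, shown under the assumption that both inputs sum to 1, and the name `diagonal_coupling` is chosen here for illustration.

```python
from typing import List, Sequence


def diagonal_coupling(p: Sequence[float], q: Sequence[float]) -> List[List[float]]:
    """Coupling of p and q whose diagonal entries are min(p_i, q_i)."""
    n = len(p)
    common = [min(pi, qi) for pi, qi in zip(p, q)]
    excess_p = [pi - c for pi, c in zip(p, common)]
    excess_q = [qi - c for qi, c in zip(q, common)]
    leftover = sum(excess_p)      # equals 1 - sum(common) when p sums to 1
    coupling = [[0.0] * n for _ in range(n)]
    for i in range(n):
        coupling[i][i] = common[i]
    if leftover > 0:
        for i in range(n):
            for j in range(n):
                # For i == j one of the two excesses is zero, so the diagonal
                # keeps exactly the weight min(p_i, q_i).
                coupling[i][j] += excess_p[i] * excess_q[j] / leftover
    return coupling


# Hypothetical example on a two-point space.
c = diagonal_coupling([0.5, 0.5], [0.25, 0.75])
off_diagonal = sum(c[i][j] for i in range(2) for j in range(2) if i != j)
print(c, off_diagonal)
```

The off-diagonal mass of this coupling is $1 - \sum_i \min\{p_i, q_i\}$, and since all diagonal pairs are at distance zero, this mass bounds $\Delta$ for the coupling.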

Finally, we observe that samples are close to the given distribution on $X$.

Lemma 5.5. For $N$ sufficiently large, the probability that $d_{GHP}(X_N, X) > 3\epsilon$ is
less than $\epsilon$.

Proof. By the law of large numbers, if $p_i = \mu(A_i)$, then given a sample $X_N$ with
$N$ points, $q_i = \mu_N(A_i)$ converges to $p_i$ as $N \to \infty$. Hence, for $N$ sufficiently large,
$P(|q_i - p_i| > \epsilon/n) < \epsilon/n$. By Lemma 5.4,

(7) $P(d_{GHP}(\hat{X}_N, \hat{X}) > \epsilon) < \epsilon$.

Lemmas 5.2 and 5.3, Equation (7) and the triangle inequality show that

$$P(d_{GHP}(X_N, X) > 3\epsilon) < \epsilon$$

as claimed.



□ 
6. The Asymptotically Lipschitz correspondence

We can now show that the correspondence between metric measure spaces and
distributions on distance matrices is asymptotically bi-Lipschitz. Let $(X, d, \mu)$ be
a metric measure space. For $N \in \mathbb{N}$, let $x_1, x_2, \dots, x_N$ be a sequence of points in
$X$ sampled independently according to $\mu$. We associate to the random sample
$x_1, x_2, \dots, x_N$ the distance matrix $M_N^X$. We denote the distribution of $M_N^X$ by $\theta_N^X$. We
thus have a sequence of distributions $\theta_N^X$ on the metric spaces $\mathbb{D}(N)$ associated to
$X$.

Consider a pair of metric measure spaces $X$ and $Y$ and the corresponding
sequences of probability distributions $\theta_N^X$ and $\theta_N^Y$. In all the statements in this and the
next section, Proposition 3.1 shows that we can replace $d_P(\cdot, \cdot; d_\pi)$ by $d_P(\cdot, \cdot; d_M)$.

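Theorem 6.1. For metric measure spaces $X$ and $Y$, we have

(1) $\limsup_{N \to \infty} d_P(\theta_N^X, \theta_N^Y; d_\pi) \le 2 d_{GHP}(X, Y)$.

(2) If $\liminf_{N \to \infty} d_P(\theta_N^X, \theta_N^Y; d_\pi) < \epsilon < 1$, then $d_{GHP}(X, Y) \le \epsilon$.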

Proof. Suppose $\epsilon > 0$ is arbitrary. For $N > 0$ an integer, let $X_N$ and $Y_N$ be
empirical spaces determined by choosing $N$ points in each of $X$ and $Y$, such that all the $2N$
points are chosen independently. Let $M_N^X$ and $M_N^Y$ be the corresponding random
distance matrices. Note that these have distributions $\theta_N^X$ and $\theta_N^Y$, respectively.

If $N$ is sufficiently large, by Lemma 5.5, $d_{GHP}(X_N, X) \le 3\epsilon$ with probability
at least $1 - \epsilon$ and $d_{GHP}(Y_N, Y) \le 3\epsilon$ with probability at least $1 - \epsilon$. Hence
$d_{GHP}(X_N, Y_N) \le d_{GHP}(X, Y) + 6\epsilon$ with probability at least $1 - 2\epsilon$. By
Theorem 4.1, $d_\pi(M_N^X, M_N^Y) \le 2 d_{GHP}(X_N, Y_N)$. It follows that for $N$ sufficiently large

$$P(d_\pi(M_N^X, M_N^Y) \le 2 d_{GHP}(X, Y) + 12\epsilon) \ge 1 - 2\epsilon.$$

As $\epsilon > 0$ is arbitrary and $M_N^X$ and $M_N^Y$ have distributions $\theta_N^X$ and $\theta_N^Y$,
respectively, it follows that

$$\limsup_{N \to \infty} d_P(\theta_N^X, \theta_N^Y; d_\pi) \le 2 d_{GHP}(X, Y).$$

On the other hand, suppose $\liminf_{N \to \infty} d_P(\theta_N^X, \theta_N^Y; d_M) < \epsilon < 1$. Let $X_N$, $Y_N$, $M_N^X$
and $M_N^Y$ be as before. Then for infinitely many $N$, $P(d_\pi(M_N^X, M_N^Y) < \epsilon) > 1 - \epsilon$.
By Theorem 4.1, $d_{GHP}(X_N, Y_N) \le d_\pi(M_N^X, M_N^Y)$. It follows that, for infinitely
many $N$,

(8) $P(d_{GHP}(X_N, Y_N) < \epsilon) > 1 - \epsilon$.

Now, let $\delta > 0$ be arbitrary and pick $N$ so that Equation (8) holds and $N$ is
sufficiently large so that $d_{GHP}(X_N, X) < 3\delta$ and $d_{GHP}(Y_N, Y) < 3\delta$, with probability at
least $1 - 6\delta$. By the triangle inequality and Equation (8) it follows that

$$P(d_{GHP}(X, Y) < \epsilon + 6\delta) > 1 - \epsilon - 6\delta.$$

As $\delta > 0$ is arbitrary and $d_{GHP}(X, Y)$ is a constant, it follows that $d_{GHP}(X, Y) \le \epsilon$ if
$\epsilon < 1$.

□ 

7. Uniform bounds for sample distances 

One would expect that if metric measure spaces $X$ and $Y$ are close, then the
distributions $\theta_N^X$ and $\theta_N^Y$ are close for all $N$, not just for $N$ large. We do show that
this is the case. However, somewhat surprisingly, the correspondence is Hölder-1/2,
but not Hölder-$\alpha$ for $\alpha > 1/2$. In particular, the correspondence is not Lipschitz.

Theorem 7.1. If $X$ and $Y$ are metric measure spaces with $d_{GHP}(X, Y) < 1/4$,
then, for all $N \in \mathbb{N}$,

$$d_P(\theta_N^X, \theta_N^Y; d_\pi) \le d_{GHP}(X, Y)^{1/2}.$$

Proof. Suppose $d_{GHP}(X, Y) < \epsilon$, consider embeddings from $X$ and $Y$ into a space
$Z$ and a measure $\nu$ on $Z \times Z$ with marginals the pushforwards of the distributions
of $X$ and $Y$ so that $\Delta(\nu) < \epsilon$. For $N$ sufficiently large, let $P$ be the product
measure $\nu^N$ on $(Z \times Z)^N$ and consider the random variables associating to the
point $\omega = (x_1, y_1, \dots, x_N, y_N) \in (Z \times Z)^N$ the matrices $M_N^X = (d(x_i, x_j))$ and
$M_N^Y = (d(y_i, y_j))$. The distributions of $M_N^X$ and $M_N^Y$ are $\theta_N^X$ and $\theta_N^Y$, respectively.
For a point $\omega = (x_1, y_1, \dots, x_N, y_N)$, let $B(\omega)$ be the cardinality of the set
$\{1 \le i \le N : d(x_i, y_i) \ge \epsilon\}$. Then $B$ has a binomial distribution with parameters
$N$ and $p$, with $p$ the probability that $d(x_i, y_i) \ge \epsilon$. Note that $p < \epsilon$.

Lemma 7.2. We have $d_\pi(M_N^X(\omega), M_N^Y(\omega)) \le \max(B(\omega)/N, 2\epsilon)$.

Proof. After permutations we can assume that if $d(x_i(\omega), y_i(\omega)) \ge \epsilon$, then
$i > N - B(\omega)$. The triangle inequality then shows that, for $1 \le i, j \le N(1 - B(\omega)/N)$,

$$|d(x_i(\omega), x_j(\omega)) - d(y_i(\omega), y_j(\omega))| < 2\epsilon.$$

By definition of $d_\pi$ the claim follows. □

We now bound the probability that B{uj)/N is large. 

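Lemma 7.3. If $B$ is a Binomial$(N, p)$ random variable with $p < \epsilon < 1/4$, then

$$\mathrm{Prob}(B > N\epsilon^{1/2}) \le \epsilon^{1/2}.$$

Proof. As $B$ is a non-negative random variable and $E(B) = Np < N\epsilon$, the Markov
inequality gives

$$P(B > N\epsilon^{1/2}) \le \frac{E(B)}{N\epsilon^{1/2}} \le \frac{N\epsilon}{N\epsilon^{1/2}} = \epsilon^{1/2}.$$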

□ 
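The Markov bound can be compared with the exact binomial tail for particular values. The Python sketch below does this; the function name `binomial_tail` and the sample parameters are chosen here purely for illustration.

```python
from math import comb, sqrt


def binomial_tail(n: int, p: float, threshold: float) -> float:
    """Exact value of P(B > threshold) for B ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k > threshold)


# Hypothetical parameters with p < eps < 1/4.
eps = 0.04
n = 50
p = 0.9 * eps
tail = binomial_tail(n, p, n * sqrt(eps))
print(tail, sqrt(eps))   # the exact tail next to the Markov bound
```

In typical cases the exact tail is far smaller than $\epsilon^{1/2}$; the Markov bound is used because it holds uniformly in $N$.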

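If $\epsilon < 1/4$, by Lemma 7.2 and Lemma 7.3 it follows that

$$P(d_\pi(M_N^X(\omega), M_N^Y(\omega)) > \epsilon^{1/2}) \le \epsilon^{1/2},$$

which implies that $d_P(\theta_N^X, \theta_N^Y; d_\pi) \le \epsilon^{1/2}$. As this holds for all $\epsilon > d_{GHP}(X, Y)$,
the result follows. □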

Next, we show that the Hölder exponent 1/2 is optimal.

Theorem 7.4. Let $\alpha$ and $C > 0$ be real numbers such that $1/2 < \alpha < 1$. Then for
$\varepsilon > 0$ sufficiently small, there are spaces $X$ and $Y$ with $d_{GHP}(X, Y) < \varepsilon$ so that
there is an integer $N$ such that

\[ d_P(\theta^X_N, \theta^Y_N) > C\varepsilon^\alpha. \]

Proof. Suppose the claimed inequality is violated for some $C > 0$ and $\alpha \in (1/2, 1)$.
Consider spaces $X = \{a, b\}$ and $Y = \{c, d\}$ with $d(a, b) = 2C$ and $d(c, d) = 4C$. We
consider the measures on these spaces with weights $1 - \varepsilon$ for $a$ and $c$ and $\varepsilon$ for $b$ and
$d$. Then by considering embeddings of $X$ and $Y$ into a space $Z$ with 3 points so
that the images of $a$ and $c$ under the respective embeddings coincide, we see that
$d_{GHP}(X, Y) < \varepsilon$.

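To show $d_P(\theta^X_N, \theta^Y_N) > C\varepsilon^\alpha$, it suffices to show that for all joint distributions $\nu$
on $\mathcal{D}(N) \times \mathcal{D}(N)$ with marginals $\theta^X_N$ and $\theta^Y_N$, we have the inequality

\[ P(d_\pi(M^X_N, M^Y_N) > C\varepsilon^\alpha) > C\varepsilon^\alpha, \tag{9} \]

where the pair $(M^X_N, M^Y_N)$ has distribution $\nu$.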

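Assume $\varepsilon$ is small enough that $C\varepsilon^\alpha < 2C$. Observe that each entry of a distance matrix for points in
$X$ is either $2C$ or $0$, and each entry of a distance matrix for points in $Y$ is either $4C$ or $0$. Hence
the difference between corresponding entries of $M_X$ and $M_Y$ is at most $C\varepsilon^\alpha$ if and only if both
the entries are zero.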

Pick $N$ so that we have

\[ \frac{1}{2} < NC\varepsilon^\alpha < 1. \tag{10} \]

If distance matrices $M_X$ and $M_Y$ for $X$ and $Y$ satisfy $d_\pi(M_X, M_Y) \le C\varepsilon^\alpha$, then
by the above (as no rows can be omitted) they must both be the zero matrices. In
particular,

\[ P(d_\pi(M^X_N, M^Y_N) > C\varepsilon^\alpha) \ge P(M^X_N \neq 0). \tag{11} \]

We find a lower bound for the right hand side of the above equation. 

Lemma 7.5. If $\varepsilon > 0$ is sufficiently small, $P(M^X_N \neq 0) > C\varepsilon^\alpha$.

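Proof. Let $x_1, x_2, \ldots, x_N$ be $N$ independent points sampled from $X$. Let $A_i$ be the
event $A_i = \{x_i = b\}$. Then $\{M^X_N \neq 0\}$ is the event $\bigcup_{i=1}^N A_i \setminus \bigcap_{i=1}^N A_i$, hence

\[ P(M^X_N \neq 0) \ge P\Big(\bigcup_{i=1}^N A_i\Big) - P\Big(\bigcap_{i=1}^N A_i\Big). \tag{12} \]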

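Now, the Bonferroni inequality gives

\[ P\Big(\bigcup_{i=1}^N A_i\Big) \ge \sum_{i=1}^N P(A_i) - \sum_{1 \le i < j \le N} P(A_i \cap A_j) = N\varepsilon - \frac{N(N-1)}{2}\varepsilon^2. \tag{13} \]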

By Equation (10) we see that

\[ N > \frac{1}{2C\varepsilon^\alpha}, \]

so that

\[ N\varepsilon > \frac{\varepsilon^{1-\alpha}}{2C}. \tag{14} \]

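Observe that as $1/2 < \alpha < 1$, we have $1 - \alpha < \alpha < 1$. Hence, using Equation (14), for $\varepsilon$
sufficiently small, we have

\[ N\varepsilon - \frac{N(N-1)}{2}\varepsilon^2 > 2C\varepsilon^\alpha. \tag{15} \]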
Further, note that for $\varepsilon$ sufficiently small, as $\alpha < 1 < N$,

\[ P\Big(\bigcap_{i=1}^N A_i\Big) = \varepsilon^N < C\varepsilon^\alpha. \tag{16} \]

Using Equations (12), (13), (15) and (16) we obtain that for $\varepsilon$ sufficiently small,

\[ P(M^X_N \neq 0) > C\varepsilon^\alpha \tag{17} \]

as required. □

Using Lemma 7.5 and Equation (11) we obtain Equation (9), completing the proof
of Theorem 7.4.

□ 
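
The two-point construction in the proof of Theorem 7.4 is simple enough to evaluate exactly. The sketch below is an illustration only: the helper names are hypothetical, and it uses the observation from the proof of Lemma 7.5 that the sample distance matrix for $X = \{a, b\}$ is nonzero exactly when the sample contains both $a$ and $b$, so that $P(M^X_N \neq 0) = 1 - (1-\varepsilon)^N - \varepsilon^N$.

```python
def pick_sample_size(C, alpha, eps):
    """Return an integer N with 1/2 < N*C*eps**alpha < 1 (Equation (10)).

    Aims for N*C*eps**alpha close to 3/4; returns None if no such integer
    is found, which can happen when eps is not small.
    """
    scale = C * eps**alpha
    n = round(0.75 / scale)
    if n >= 1 and 0.5 < n * scale < 1:
        return n
    return None


def prob_nonzero(n, eps):
    """Exact P(M != 0): the sample contains b at least once, but not only b."""
    return 1.0 - (1.0 - eps) ** n - eps**n


def bonferroni_lower_bound(n, eps):
    """Right-hand side of Equation (13): N*eps - N(N-1)/2 * eps^2."""
    return n * eps - n * (n - 1) / 2 * eps**2


if __name__ == "__main__":
    C, alpha, eps = 1.0, 0.75, 1e-4
    n = pick_sample_size(C, alpha, eps)
    print(n, prob_nonzero(n, eps), C * eps**alpha)
```

For these illustrative values the probability that the matrix is nonzero exceeds $C\varepsilon^\alpha$ by well over an order of magnitude, in line with Lemma 7.5.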

8. Relative entropy of metric measure spaces 

Let $(X, d, \mu)$ be a (compact, as always) metric measure space and let $x_1, \ldots, x_N$
be points sampled independently from the measure $\mu$. Let $X_N = \{x_1, \ldots, x_N\}$ with
the induced metric from $X$ and endowed with the measure $\mu_N = N^{-1} \sum_{k=1}^N \delta_{x_k}$.
Then $(X_N, d, \mu_N)$ is a metric measure space and $d_{GHP}(X_N, X) \to 0$ in probability
by Lemma 5.5. Recall that convergence in probability means $P(d_{GHP}(X_N, X) >
\varepsilon) \to 0$ for any $\varepsilon > 0$. It is natural to ask whether a large deviation principle holds
for this convergence. We quickly recall what this means. For more details we refer
to the comprehensive book by Dembo and Zeitouni [1].
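
To make the objects $X_N$ and $\mu_N$ concrete, here is a minimal sketch for a finite metric measure space, given by a distance table `dist` and a weight vector `mu`. All names are hypothetical, and the empirical measure is represented as a weight vector on the points of $X$ (repeated sample points accumulate mass).

```python
import random


def sample_indices(mu, n, seed=0):
    """Draw n independent point indices according to the weights mu."""
    rng = random.Random(seed)
    return rng.choices(range(len(mu)), weights=mu, k=n)


def distance_matrix(dist, indices):
    """Sample distance matrix (d(x_i, x_j)) for the sampled indices."""
    return [[dist[i][j] for j in indices] for i in indices]


def empirical_measure(indices, size):
    """Empirical measure mu_N = (1/N) * sum of point masses at the samples."""
    counts = [0] * size
    for i in indices:
        counts[i] += 1
    return [c / len(indices) for c in counts]


if __name__ == "__main__":
    dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]  # three points on a line
    mu = [0.2, 0.3, 0.5]
    idx = sample_indices(mu, 5)
    print(distance_matrix(dist, idx))
    print(empirical_measure(sample_indices(mu, 20000), len(mu)))
```

For large $N$ the printed empirical weights should be close to `mu`, which is the finite-space shadow of the convergence $X_N \to X$ discussed above.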

Definition 8.1. Let $\mathcal{X}$ denote the space of all metric measure spaces endowed
with the metric $d_{GHP}$. Let $X_N$, $X$ be as above. Let $I : \mathcal{X} \to [0, \infty]$ be a lower
semicontinuous function. We say that a large deviation principle holds for the
sequence $X_N$ with the rate function $I$ if for any Borel set $A \subseteq \mathcal{X}$ with interior $A^\circ$
and closure $\bar{A}$, we have

\[ -\inf_{Y \in A^\circ} I(Y) \le \liminf_{N \to \infty} \frac{\log P(X_N \in A)}{N} \le \limsup_{N \to \infty} \frac{\log P(X_N \in A)}{N} \le -\inf_{Y \in \bar{A}} I(Y). \]

Usually it is desirable that the rate function be good, meaning that the set $\{Y \in
\mathcal{X} : I(Y) \le t\}$ is compact in $\mathcal{X}$ for every $t \in [0, \infty)$.

The same definition can be made for random variables taking values in any
metric space in place of $\mathcal{X}$ (see page 5 of [1]). To understand the meaning of this, let
$(Y, \rho, \nu)$ be any metric measure space and let $B$ be the ball of radius $\delta$ in $\mathcal{X}$ around
$Y$. If $B$ is disjoint from $X$, then Lemma 5.5 implies that $P(X_N \in B) \to 0$. However,
if the large deviation principle holds, then we will have $P(X_N \in B) \approx e^{-Na}$ where
$a = \inf_{Z \in B} I(Z)$. Thus the probability for the sampled metric measure space $X_N$
to "look like" a space $Y$ decays exponentially fast for $Y \neq X$ (if $a > 0$).

Apart from the naturalness of the question, another reason for asking for a large
deviation principle is that if it holds, the rate function $I_X(Y)$ will be something
that may be called the relative entropy of the metric measure space $Y$ with respect
to $X$. To illustrate this point, we recall the well-known theorem of Sanov.

Example 8.2. Let $X$ and $X_N$ be as above, but now we only consider the convergence
of $\mu_N$ to $\mu$ in the space of probability measures on $X$. By Prohorov's theorem,
the space of probability measures on $X$ is a compact metric space. Sanov's theorem
(see page 263 of [1]) asserts that the $\mu_N$ satisfy a large deviation principle with the good
rate function

\[ I(\nu) = \begin{cases} \int \log \frac{d\nu}{d\mu} \, d\nu & \text{if } \nu \text{ is absolutely continuous with respect to } \mu, \\ \infty & \text{otherwise.} \end{cases} \]

The quantity $I(\nu)$, often denoted $D(\nu \| \mu)$, is called the relative entropy or the Kullback-Leibler
divergence of $\nu$ with respect to $\mu$. For example, if $\mu = \frac{1}{2}\delta_0 + \frac{1}{2}\delta_1$ and $\nu = p\delta_0 +
(1-p)\delta_1$, then $D(\nu \| \mu) = \log 2 + p \log p + (1-p) \log(1-p)$.
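
The two-point example can be checked numerically. The sketch below (hypothetical function names, illustrative parameters) computes $D(\nu \| \mu)$ for measures on $\{0, 1\}$ and compares it with the exact exponential decay rate of the probability that at least `k_min` of $N$ fair coin flips land on $0$, which by Sanov's theorem should approach $D(\nu \| \mu)$ with $p = $ `k_min`$/N$ as $N$ grows.

```python
from math import comb, log


def kl_two_point(p, q=0.5):
    """D(nu || mu) for nu = p*delta_0 + (1-p)*delta_1, mu = q*delta_0 + (1-q)*delta_1."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:  # convention: 0 * log 0 = 0
            total += a * log(a / b)
    return total


def exact_decay_rate(n, k_min):
    """-(1/n) * log P(at least k_min zeros among n fair coin flips), computed exactly."""
    favourable = sum(comb(n, k) for k in range(k_min, n + 1))
    # math.log accepts arbitrarily large Python integers.
    return -(log(favourable) - n * log(2)) / n


if __name__ == "__main__":
    n, k_min = 2000, 1400
    print(exact_decay_rate(n, k_min), kl_two_point(k_min / n))
```

The finite-$N$ rate sits slightly above the relative entropy, and the gap shrinks like $(\log N)/N$.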
Our question is to find the corresponding rate function for the convergence of
$X_N$ to $X$. A detailed proof would be long and technical, so we just state what we
expect to be the rate function and give a heuristic reason.

Definition 8.3. For two metric measure spaces $(X, d, \mu)$ and $(Y, \rho, \nu)$, define the
"relative entropy" of $Y$ with respect to $X$ by

\[ I_X(Y) = \inf \{ D(\iota_* \nu \| \mu) : \iota \text{ is an isometric embedding of } Y \text{ into } X \}. \]

Then, we expect that the large deviation principle holds for the convergence of
$X_N$ to $X$ with the rate function $I_X(\cdot)$. The reason is simply as follows.

Consider any embedding $\iota : Y \to X$. By Sanov's theorem cited earlier, the probability
that the empirical measure $\mu_N$ falls in $B_\delta(\iota_* \nu)$ (the ball of radius $\delta$ around $\iota_* \nu$ in the
space of probability measures on $X$) is about $\exp\{-N D(\iota_* \nu \| \mu)\}$.
Since we can choose any embedding, the one with the maximum probability
should dominate, giving rise to the definition of $I_X(Y)$.
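
For finite spaces the infimum in Definition 8.3 can be computed by brute force, which may help in getting a feel for the quantity. The following is a sketch only, under the assumptions that both spaces are finite with distinct points at positive distance (so isometric embeddings are injective maps) and that each space is given by a distance table and a weight vector; the function name `relative_entropy_finite` is hypothetical.

```python
from itertools import permutations
from math import inf, log


def relative_entropy_finite(dist_x, mu, dist_y, nu, tol=1e-12):
    """Brute-force I_X(Y) for finite metric measure spaces.

    Tries every injective map from the points of Y to the points of X,
    keeps those that preserve all distances, and minimises the
    Kullback-Leibler divergence of the pushforward of nu against mu.
    Returns inf if Y admits no isometric embedding into X.
    """
    n_x, n_y = len(mu), len(nu)
    best = inf
    for image in permutations(range(n_x), n_y):
        isometric = all(
            abs(dist_x[image[i]][image[j]] - dist_y[i][j]) <= tol
            for i in range(n_y)
            for j in range(i + 1, n_y)
        )
        if not isometric:
            continue
        divergence = 0.0
        for i in range(n_y):
            if nu[i] == 0:
                continue  # convention: 0 * log 0 = 0
            if mu[image[i]] == 0:
                divergence = inf  # pushforward not absolutely continuous
                break
            divergence += nu[i] * log(nu[i] / mu[image[i]])
        best = min(best, divergence)
    return best


if __name__ == "__main__":
    two_points = [[0, 1], [1, 0]]
    print(relative_entropy_finite(two_points, [0.5, 0.5], two_points, [0.25, 0.75]))
```

The search is factorial in the number of points, so this is only meant for very small examples.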

It is not our intention to give a detailed proof of this statement here, but to
introduce the quantity $I_X(Y)$, which may be called the relative entropy of the
metric measure space $Y$ with respect to $X$.